Online and offline heuristics for inferring hierarchies of repetitions in sequences
Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimizes the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most frequent, the online technique, and, lagging by some distance, the longest-first technique.
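As a rough illustration of the offline idea, the sketch below repeatedly replaces the most frequent adjacent pair of symbols with a new production rule. It is not the paper's algorithm: restricting candidates to pairs, the rule names `R0`, `R1`, ..., and the helper names `count_pairs`, `infer_grammar`, and `expand` are all assumptions made for this example, and this naive version rescans the sequence on every pass, so it does not achieve the linear running time reported in the abstract.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts = Counter()
    next_free = {}  # pair -> first index at which it may be counted again
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if next_free.get(pair, 0) > i:
            continue  # overlaps the previous occurrence, e.g. in "aaa"
        counts[pair] += 1
        next_free[pair] = i + 2
    return counts


def infer_grammar(text):
    """Offline, most-frequent-pair-first grammar inference (illustrative only).

    Returns (start_sequence, rules), where rules maps a rule name to the
    pair of symbols it expands to. Terminals are single characters and
    rule names are longer strings, so the two never collide.
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so nothing is left to replace
        name = f"R{len(rules)}"
        rules[name] = pair
        # Replace occurrences of the chosen pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbols, rules):
    """Expand a symbol sequence back into the original text."""
    return "".join(
        expand(rules[s], rules) if s in rules else s for s in symbols
    )


if __name__ == "__main__":
    start, rules = infer_grammar("abcabcabc")
    print(start, rules)
    print(expand(start, rules))
```

Expanding the start sequence with the inferred rules reproduces the input, which is a convenient sanity check for any replacement order. Swapping the selection step (for example, preferring the longest repeated string) would give a different heuristic with the same overall loop.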
Detecting sequential structure
Programming by demonstration requires detection and analysis of sequential patterns in a user's input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.
Steady-state, effective-temperature dynamics in a glassy material
We present an STZ-based analysis of numerical simulations by Haxton and Liu
(HL). The extensive HL data sharply test the basic assumptions of the STZ
theory, especially the central role played by the effective disorder
temperature as a dynamical state variable. We find that the theory survives
these tests, and that the HL data provide important and interesting constraints
on some of its specific ingredients. Our most surprising conclusion is that,
when driven at various constant shear rates in the low-temperature glassy
state, the HL system exhibits a classic glass transition, including
super-Arrhenius behavior, as a function of the effective temperature. Comment: 9 pages, 6 figures
Increased human pathogenic potential of Escherichia coli from polymicrobial urinary tract infections in comparison to isolates from monomicrobial culture samples
The current diagnostic standard procedure outlined by the Health Protection Agency for urinary tract infections (UTIs) in clinical laboratories does not report bacteria isolated from samples containing three or more different bacterial species. As a result many UTIs go unreported and untreated, particularly in elderly patients, where polymicrobial UTI samples are especially prevalent. This study reports the presence of the major uropathogenic species in mixed culture urine samples from elderly patients, and of resistance to front-line antibiotics, with potentially increased levels of resistance to ciprofloxacin and trimethoprim. Most importantly, the study highlights that Escherichia coli present in polymicrobial UTI samples are statistically more invasive (P<0.001) in in vitro epithelial cell infection assays than those isolated from monomicrobial culture samples. In summary, the results of this study suggest that the current diagnostic standard procedure for polymicrobial UTI samples needs to be reassessed, and that E. coli present in polymicrobial UTI samples may pose an increased risk to human health.
Generalized Modeling Approaches to Risk Adjustment of Skewed Outcomes Data
There are two broad classes of models used to address the econometric problems caused by skewness in data commonly encountered in health care applications: (1) transformation to deal with skewness (e.g., OLS on ln(y)); and (2) alternative weighting approaches based on exponential conditional models (ECM) and generalized linear model (GLM) approaches. In this paper, we encompass these two classes of models using the three-parameter generalized gamma (GGM) distribution, which includes several of the standard alternatives as special cases: OLS with a normal error, OLS for the log normal, the standard gamma and exponential with a log link, and the Weibull. Using simulation methods, we find the tests of identifying distributions to be robust. The GGM also provides a potentially more robust alternative estimator to the standard alternatives. An example using inpatient expenditures is also analyzed.
Extracting text from PostScript
We show how to extract plain text from PostScript files. A textual scan is inadequate because PostScript interpreters can generate characters on the page that do not appear in the source file. Furthermore, word and line breaks are implicit in the graphical rendition, and must be inferred from the positioning of word fragments. We present a robust technique for extracting text and recognizing words and paragraphs. The method uses a standard PostScript interpreter but redefines several PostScript operators, and simple heuristics are employed to locate word and line breaks. The scheme has been used to create a full-text index, and plain-text versions, of 40,000 technical reports (34 Gbyte of PostScript). Other text-extraction systems are reviewed: none offer the same combination of robustness and simplicity.
Transposon mutagenesis in a hyper-invasive clinical isolate of Campylobacter jejuni reveals a number of genes with potential roles in invasion
Transposon mutagenesis has been applied to a hyper-invasive clinical isolate of Campylobacter jejuni, 01/51. A random transposon mutant library was screened in an in vitro assay of invasion and 26 mutants with a significant reduction in invasion were identified. Given that the invasion potential of C. jejuni is relatively poor compared to other enteric pathogens, the use of a hyper-invasive strain was advantageous as it greatly facilitated the identification of mutants with reduced invasion. The location of the transposon insertion in 23 of these mutants has been determined; all but three of the insertions are in genes also present in the genome-sequenced strain NCTC 11168. Eight of the mutants contain transposon insertions in one region of the genome (~14 kb), which when compared with the genome of NCTC 11168 overlaps with one of the previously reported plasticity regions and is likely to be involved in genomic variation between strains. Further characterization of one of the mutants within this region has identified a gene that might be involved in adhesion to host cells.
Scaling and Universality in the Counterion-Condensation Transition at Charged Cylinders
We address the critical and universal aspects of the counterion-condensation
transition at a single charged cylinder in both two and three spatial
dimensions using numerical and analytical methods. By introducing a novel
Monte-Carlo sampling method in logarithmic radial scale, we are able to
numerically simulate the critical limit of infinite system size (corresponding
to infinite-dilution limit) within tractable equilibration times. The critical
exponents are determined for the inverse moments of the counterionic density
profile (which play the role of the order parameters and represent the inverse
localization length of counterions) both within mean-field theory and within
Monte-Carlo simulations. In three dimensions (3D), correlation effects
(neglected within mean-field theory) lead to an excessive accumulation of
counterions near the charged cylinder below the critical temperature
(condensation phase), while surprisingly, the critical region exhibits
universal critical exponents in accord with the mean-field theory. In two
dimensions (2D), we demonstrate, using both numerical and analytical
approaches, that the mean-field theory becomes exact at all temperatures
(Manning parameters), when the number of counterions tends to infinity. For finite
particle number, however, the 2D problem displays a series of peculiar singular
points (with diverging heat capacity), which reflect successive de-localization
events of individual counterions from the central cylinder. In both 2D and 3D,
the heat capacity shows a universal jump at the critical point, and the energy
develops a pronounced peak. The asymptotic behavior of the energy peak location
is used to locate the critical temperature, which is also found to be universal
and in accordance with the mean-field prediction. Comment: 31 pages, 16 figures